## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ dplyr 1.0.3
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
Definitions:
* Cancer diagnosis information was taken from HES (Hospital Episode Statistics), coded using ICD-10 and ICD-9 codes. Additional cancer information at baseline was taken from self-reported cancers.
Table 1. Characteristics by case/control status, with P values

| Characteristic | N | Control (n=449,042)¹ | Lung (n=1,987)¹ | Bladder (n=1,724)¹ | P value: Control vs. Lung² | P value: Control vs. Bladder² |
|---|---|---|---|---|---|---|
| Age | 452,753 | 56.12 (8.11) | 61.62 (5.86) | 61.56 (6.14) | <0.001 | <0.001 |
| Sex | 452,753 | | | | <0.001 | <0.001 |
| Female | | 240,963 (54%) | 969 (49%) | 415 (24%) | | |
| Male | | 208,079 (46%) | 1,018 (51%) | 1,309 (76%) | | |
| Ethnicity | 450,201 | | | | <0.001 | <0.001 |
| White | | 421,004 (94%) | 1,916 (97%) | 1,661 (97%) | | |
| Non-white | | 25,510 (5.7%) | 58 (2.9%) | 52 (3.0%) | | |
| Townsend deprivation index | 452,196 | -1.30 (3.09) | 0.08 (3.64) | -1.33 (3.07) | <0.001 | 0.7 |
| Current employment status | 447,508 | | | | <0.001 | <0.001 |
| Employed | | 264,840 (60%) | 658 (33%) | 717 (42%) | | |
| Active | | 27,098 (6.1%) | 85 (4.3%) | 84 (4.9%) | | |
| Retired | | 130,946 (30%) | 1,014 (52%) | 816 (48%) | | |
| Unable to work | | 13,479 (3.0%) | 154 (7.8%) | 73 (4.3%) | | |
| Unemployed | | 7,470 (1.7%) | 54 (2.7%) | 20 (1.2%) | | |
| Education | 443,599 | | | | <0.001 | <0.001 |
| Intermediate | | 220,578 (50%) | 877 (46%) | 817 (48%) | | |
| Low | | 73,643 (17%) | 737 (38%) | 446 (26%) | | |
| High | | 145,764 (33%) | 307 (16%) | 430 (25%) | | |
| Type of accommodation lived in | 451,473 | | | | <0.001 | 0.2 |
| House | | 400,409 (89%) | 1,622 (82%) | 1,538 (89%) | | |
| Flat | | 44,537 (9.9%) | 321 (16%) | 165 (9.6%) | | |
| Other | | 2,824 (0.6%) | 40 (2.0%) | 17 (1.0%) | | |
| Own or rent accommodation lived in | 447,450 | | | | <0.001 | 0.5 |
| Own | | 395,569 (89%) | 1,487 (76%) | 1,507 (88%) | | |
| Rent | | 41,245 (9.3%) | 442 (23%) | 172 (10%) | | |
| Other | | 6,982 (1.6%) | 21 (1.1%) | 25 (1.5%) | | |
| Number of people in household | 448,693 | | | | <0.001 | <0.001 |
| 2 | | 203,994 (46%) | 1,005 (52%) | 970 (57%) | | |
| 1 | | 81,453 (18%) | 544 (28%) | 342 (20%) | | |
| 3-4 | | 134,223 (30%) | 349 (18%) | 355 (21%) | | |
| ≥5 | | 25,369 (5.7%) | 51 (2.6%) | 38 (2.2%) | | |
| Average total household income | 384,236 | | | | <0.001 | <0.001 |
| 18,000-30,999 | | 95,722 (25%) | 458 (29%) | 430 (29%) | | |
| <18,000 | | 84,060 (22%) | 726 (46%) | 441 (30%) | | |
| 31,000-51,999 | | 100,586 (26%) | 255 (16%) | 350 (24%) | | |
| >52,000 | | 100,806 (26%) | 146 (9.2%) | 256 (17%) | | |
| BMI | 449,917 | | | | <0.001 | <0.001 |
| Normal | | 145,007 (32%) | 609 (31%) | 394 (23%) | | |
| Underweight | | 2,253 (0.5%) | 22 (1.1%) | 8 (0.5%) | | |
| Pre-obesity | | 189,924 (43%) | 814 (41%) | 811 (47%) | | |
| Obesity class I | | 78,095 (18%) | 389 (20%) | 358 (21%) | | |
| Obesity class II | | 22,314 (5.0%) | 85 (4.3%) | 109 (6.4%) | | |
| Obesity class III | | 8,654 (1.9%) | 43 (2.2%) | 28 (1.6%) | | |
| Sleep per 24 hours (hours) | 448,951 | | | | <0.001 | 0.017 |
| 7-8 | | 301,971 (68%) | 1,175 (60%) | 1,114 (65%) | | |
| ≤6 | | 110,107 (25%) | 561 (29%) | 447 (26%) | | |
| ≥9 | | 33,204 (7.5%) | 219 (11%) | 153 (8.9%) | | |
| Number of days per week spent 10+ mins walking | 445,131 | | | | <0.001 | 0.001 |
| 0 | | 10,945 (2.5%) | 96 (5.0%) | 67 (4.0%) | | |
| 1 | | 12,271 (2.8%) | 54 (2.8%) | 36 (2.1%) | | |
| 2 | | 27,042 (6.1%) | 101 (5.2%) | 90 (5.3%) | | |
| 3 | | 35,151 (8.0%) | 138 (7.2%) | 126 (7.4%) | | |
| 4 | | 35,713 (8.1%) | 146 (7.6%) | 137 (8.1%) | | |
| 5 | | 71,871 (16%) | 253 (13%) | 250 (15%) | | |
| 6 | | 44,872 (10%) | 176 (9.1%) | 173 (10%) | | |
| 7 | | 203,647 (46%) | 963 (50%) | 813 (48%) | | |
| Number of days per week spent 10+ mins doing moderate exercise | 428,501 | | | | <0.001 | 0.3 |
| 0 | | 54,268 (13%) | 300 (17%) | 220 (14%) | | |
| 1 | | 34,769 (8.2%) | 89 (4.9%) | 134 (8.3%) | | |
| 2 | | 62,587 (15%) | 219 (12%) | 233 (14%) | | |
| 3 | | 64,102 (15%) | 257 (14%) | 215 (13%) | | |
| 4 | | 42,246 (9.9%) | 165 (9.1%) | 158 (9.8%) | | |
| 5 | | 64,163 (15%) | 266 (15%) | 233 (14%) | | |
| 6 | | 23,733 (5.6%) | 105 (5.8%) | 102 (6.3%) | | |
| 7 | | 79,210 (19%) | 404 (22%) | 323 (20%) | | |
| Number of days per week spent 10+ mins doing vigorous exercise | 428,191 | | | | <0.001 | <0.001 |
| 0 | | 156,834 (37%) | 921 (52%) | 666 (41%) | | |
| 1 | | 60,248 (14%) | 185 (10%) | 213 (13%) | | |
| 2 | | 67,527 (16%) | 199 (11%) | 251 (15%) | | |
| 3 | | 59,155 (14%) | 167 (9.4%) | 164 (10%) | | |
| 4 | | 27,802 (6.5%) | 85 (4.8%) | 114 (7.0%) | | |
| 5 | | 29,495 (6.9%) | 106 (6.0%) | 108 (6.7%) | | |
| 6 | | 8,609 (2.0%) | 35 (2.0%) | 33 (2.0%) | | |
| 7 | | 15,122 (3.6%) | 80 (4.5%) | 72 (4.4%) | | |
| Processed meat intake | 450,677 | | | | <0.001 | <0.001 |
| Never | | 41,791 (9.3%) | 151 (7.6%) | 100 (5.8%) | | |
| Less than once a week | | 135,473 (30%) | 510 (26%) | 463 (27%) | | |
| Once a week | | 130,337 (29%) | 568 (29%) | 532 (31%) | | |
| More than once a week | | 139,384 (31%) | 747 (38%) | 621 (36%) | | |
| Oily fish intake | 448,986 | | | | <0.001 | 0.029 |
| Never | | 49,715 (11%) | 265 (13%) | 156 (9.2%) | | |
| Less than once a week | | 148,546 (33%) | 614 (31%) | 563 (33%) | | |
| Once a week | | 167,554 (38%) | 700 (36%) | 649 (38%) | | |
| More than once a week | | 79,501 (18%) | 390 (20%) | 333 (20%) | | |
| Non oily fish intake | 449,337 | | | | 0.2 | 0.012 |
| Never | | 21,314 (4.8%) | 101 (5.1%) | 73 (4.3%) | | |
| Less than once a week | | 130,302 (29%) | 534 (27%) | 454 (27%) | | |
| Once a week | | 221,148 (50%) | 999 (51%) | 915 (54%) | | |
| More than once a week | | 72,897 (16%) | 333 (17%) | 267 (16%) | | |
| Poultry intake | 450,839 | | | | <0.001 | <0.001 |
| Never | | 23,122 (5.2%) | 92 (4.7%) | 58 (3.4%) | | |
| Less than once a week | | 47,778 (11%) | 267 (14%) | 205 (12%) | | |
| Once a week | | 159,828 (36%) | 780 (40%) | 698 (41%) | | |
| More than once a week | | 216,425 (48%) | 833 (42%) | 753 (44%) | | |
| Pork intake | 448,824 | | | | <0.001 | <0.001 |
| Never | | 77,785 (17%) | 310 (16%) | 220 (13%) | | |
| Less than once a week | | 252,816 (57%) | 1,019 (52%) | 937 (55%) | | |
| Once a week | | 98,570 (22%) | 523 (27%) | 478 (28%) | | |
| More than once a week | | 15,998 (3.6%) | 101 (5.2%) | 67 (3.9%) | | |
| Beef intake | 449,710 | | | | <0.001 | <0.001 |
| Never | | 50,060 (11%) | 182 (9.2%) | 142 (8.3%) | | |
| Less than once a week | | 202,886 (45%) | 830 (42%) | 733 (43%) | | |
| Once a week | | 141,714 (32%) | 663 (34%) | 628 (37%) | | |
| More than once a week | | 51,363 (12%) | 293 (15%) | 216 (13%) | | |
| Lamb intake | 448,631 | | | | <0.001 | <0.001 |
| Never | | 79,728 (18%) | 334 (17%) | 250 (15%) | | |
| Less than once a week | | 252,464 (57%) | 962 (49%) | 928 (54%) | | |
| Once a week | | 99,098 (22%) | 552 (28%) | 463 (27%) | | |
| More than once a week | | 13,690 (3.1%) | 100 (5.1%) | 62 (3.6%) | | |
| Salt added to food | 451,698 | | | | <0.001 | <0.001 |
| Never/rarely | | 249,063 (56%) | 883 (45%) | 883 (51%) | | |
| Sometimes | | 125,605 (28%) | 582 (29%) | 490 (28%) | | |
| Usually | | 51,740 (12%) | 308 (16%) | 248 (14%) | | |
| Always | | 21,588 (4.8%) | 208 (10%) | 100 (5.8%) | | |
| Tea intake per day (cups) | 451,657 | | | | <0.001 | 0.010 |
| 0 | | 65,823 (15%) | 375 (19%) | 236 (14%) | | |
| ≥1 | | 52,145 (12%) | 192 (9.7%) | 206 (12%) | | |
| 2-3 | | 131,685 (29%) | 431 (22%) | 455 (26%) | | |
| ≥4 | | 198,305 (44%) | 981 (50%) | 823 (48%) | | |
| Coffee intake per day (cups) | 451,534 | | | | <0.001 | 0.2 |
| 0 | | 99,520 (22%) | 484 (24%) | 351 (20%) | | |
| ≥1 | | 121,637 (27%) | 401 (20%) | 469 (27%) | | |
| 2-3 | | 138,412 (31%) | 498 (25%) | 535 (31%) | | |
| ≥4 | | 88,265 (20%) | 597 (30%) | 365 (21%) | | |
| Water intake per day (glasses) | 451,656 | | | | <0.001 | <0.001 |
| 0 | | 36,002 (8.0%) | 254 (13%) | 190 (11%) | | |
| ≥1 | | 110,002 (25%) | 504 (25%) | 469 (27%) | | |
| 2-3 | | 168,399 (38%) | 746 (38%) | 681 (40%) | | |
| ≥4 | | 133,554 (30%) | 475 (24%) | 380 (22%) | | |
| Alcohol intake frequency | 451,356 | | | | <0.001 | <0.001 |
| Never/rarely | | 87,068 (19%) | 513 (26%) | 274 (16%) | | |
| Occasionally | | 166,211 (37%) | 637 (32%) | 577 (34%) | | |
| Regularly | | 194,383 (43%) | 825 (42%) | 868 (50%) | | |
| Smoking status | 446,624 | | | | <0.001 | <0.001 |
| Never smoker - No smoker in household | | 224,753 (51%) | 252 (13%) | 529 (31%) | | |
| Never smoker - Yes, smoker in household | | 21,466 (4.8%) | 24 (1.2%) | 43 (2.5%) | | |
| Previous smoker - No smoker in household | | 133,407 (30%) | 776 (40%) | 726 (43%) | | |
| Previous smoker - Yes, smoker in household | | 17,073 (3.9%) | 94 (4.8%) | 98 (5.8%) | | |
| Current smoker | | 46,281 (10%) | 800 (41%) | 302 (18%) | | |
| Maternal smoking around birth | 390,442 | 112,784 (29%) | 615 (38%) | 444 (30%) | <0.001 | 0.5 |
| NO2 (µg/m³) | 446,097 | 26.73 (7.58) | 27.95 (7.58) | 26.45 (7.68) | <0.001 | 0.13 |
| NOx (µg/m³) | 446,097 | 44.14 (15.52) | 46.71 (16.02) | 44.28 (15.97) | <0.001 | 0.7 |
| PM10 (µg/m³) | 415,390 | 16.24 (1.90) | 16.40 (1.87) | 16.16 (1.89) | <0.001 | 0.071 |
| PM2.5 (absorbance/m) | 415,390 | 1.19 (0.27) | 1.21 (0.28) | 1.17 (0.27) | <0.001 | 0.017 |
| PM2.5 (µg/m³) | 415,390 | 9.99 (1.06) | 10.20 (1.11) | 9.97 (1.05) | <0.001 | 0.4 |
| PM2.5-10µm (µg/m³) | 415,390 | 6.43 (0.90) | 6.45 (0.90) | 6.41 (0.87) | 0.3 | 0.3 |
| Number of medications | 451,933 | | | | <0.001 | <0.001 |
| 0 | | 127,761 (29%) | 293 (15%) | 351 (20%) | | |
| 1 | | 85,684 (19%) | 275 (14%) | 265 (15%) | | |
| >1 | | 234,789 (52%) | 1,413 (71%) | 1,102 (64%) | | |
| Parental history of COPD | 446,744 | 63,000 (14%) | 428 (22%) | 303 (18%) | <0.001 | <0.001 |
| Parental history of diabetes | 446,744 | 77,822 (18%) | 266 (14%) | 269 (16%) | <0.001 | 0.074 |
| Parental history of hypertension | 446,744 | 185,376 (42%) | 587 (30%) | 584 (34%) | <0.001 | <0.001 |
| Parental history of stroke | 446,744 | 109,584 (25%) | 504 (26%) | 448 (26%) | 0.15 | 0.11 |
| Parental history of heart disease | 446,744 | 177,952 (40%) | 780 (41%) | 712 (42%) | 0.8 | 0.12 |
| Parental history of breast cancer | 442,332 | 32,279 (7.4%) | 121 (6.4%) | 128 (7.7%) | 0.11 | 0.7 |
| Parental history of bowel cancer | 442,332 | 41,372 (9.4%) | 192 (10%) | 175 (10%) | 0.3 | 0.2 |
| Parental history of lung cancer | 442,332 | 48,626 (11%) | 363 (19%) | 202 (12%) | <0.001 | 0.2 |
| Parental history of prostate cancer | 442,332 | 29,271 (6.7%) | 88 (4.6%) | 98 (5.9%) | <0.001 | 0.2 |
| Cardiovascular disease | 452,753 | 51,508 (11%) | 508 (26%) | 362 (21%) | <0.001 | <0.001 |
| Hypertension | 452,753 | 120,527 (27%) | 790 (40%) | 684 (40%) | <0.001 | <0.001 |
| Diabetes | 452,753 | 22,456 (5.0%) | 181 (9.1%) | 192 (11%) | <0.001 | <0.001 |
| Respiratory disease | 452,753 | 72,417 (16%) | 558 (28%) | 300 (17%) | <0.001 | 0.2 |
| Autoimmune disease | 452,753 | 49,706 (11%) | 282 (14%) | 189 (11%) | <0.001 | >0.9 |

¹ Mean (SD); n (%)
² Student's t-test for continuous variables; chi-squared test for categorical variables
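
As a rough check on how the P values in Table 1 arise, the sketch below recomputes two of the Control vs. Lung comparisons from the published summary figures only: a chi-squared test on the Sex counts and a pooled-variance (Student's) t-test on Age. The helper `t_test_from_summary()` is hypothetical and written here because `t.test()` needs raw data; the original analysis may have used different options (for example, no continuity correction), so exact P values could differ slightly.

```r
# Counts taken from the Sex rows of Table 1 (controls vs. lung cases).
sex_counts <- matrix(
  c(240963, 208079,   # controls: female, male
    969,    1018),    # lung cases: female, male
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("Control", "Lung"), sex = c("Female", "Male"))
)
# chisq.test() applies Yates' continuity correction to 2x2 tables by default.
sex_test <- chisq.test(sex_counts)

# Hypothetical helper: Student's (pooled-variance) t-test from summary
# statistics (means, SDs and group sizes) rather than raw observations.
t_test_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  df <- n1 + n2 - 2
  pooled_var <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
  t_stat <- (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
  list(statistic = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# Age: mean (SD) and n for controls and lung cases, as reported in Table 1.
age_test <- t_test_from_summary(56.12, 8.11, 449042, 61.62, 5.86, 1987)

sex_test$p.value < 0.001
age_test$p.value < 0.001
```

Both comparisons fall below 0.001, in line with the "<0.001" entries reported for Sex and Age.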
[Figures not shown: Manhattan (2 panels), P Values (6 panels), Forest (4 panels)]
Age at Diagnosis Analysis
Time to Diagnosis Analysis
Models:
Variables were denoised using linear regression for continuous variables and logistic regression for categorical variables. One-hot encoding was used for categorical variables with more than two levels.
Additionally, models with forced confounders were run to check for any bias in the denoised datasets.
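
A minimal sketch of what the denoising step might look like, on simulated data. It assumes that "denoising" means replacing each variable with its residuals after regressing it on the confounders (age, sex and BMI here, mirroring the adjustment sets named later in this report), and that response-scale residuals are used for the logistic case. All variable names and the simulated values are hypothetical.

```r
set.seed(1)
n <- 500

# Simulated stand-in data; every variable name here is hypothetical.
dat <- data.frame(
  age = rnorm(n, 56, 8),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  bmi = rnorm(n, 27, 4)
)
dat$no2 <- 20 + 0.1 * dat$age + rnorm(n)                 # continuous variable
dat$retired <- rbinom(n, 1, plogis(-6 + 0.1 * dat$age))  # binary variable
dat$education <- factor(
  sample(c("Low", "Intermediate", "High"), n, replace = TRUE)
)

# Continuous variable: residuals from a linear regression on the confounders.
no2_denoised <- resid(lm(no2 ~ age + sex + bmi, data = dat))

# Binary variable: residuals from a logistic regression on the confounders.
# Assumption: response-scale residuals (observed minus fitted probability).
retired_fit <- glm(retired ~ age + sex + bmi, family = binomial, data = dat)
retired_denoised <- resid(retired_fit, type = "response")

# Categorical variable with more than two levels: one-hot encode it, after
# which each indicator column can be treated like the binary variable above.
education_onehot <- model.matrix(~ education - 1, data = dat)
colnames(education_onehot)
```

By construction the linear-regression residuals are uncorrelated with the confounders, which is the property the forced-confounder models are then used to check.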
Four models were run for each outcome (lung and bladder cancer):
Lung: Mean Odds Ratio
Bladder: Mean Odds Ratio
Lung: Selection Proportion
Bladder: Selection Proportion
Lung: Base model AUC
Lung: Adjusted model AUC
Bladder: Base model AUC
Bladder: Adjusted model AUC
Lung: Mean Odds Ratio
Bladder: Mean Odds Ratio
Lung: Selection Proportion
Bladder: Selection Proportion
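
For orientation on the base versus adjusted AUC panels, here is a small sketch of how such a comparison could be computed on simulated data. It assumes, without confirmation from the report, that the base model contains age, sex and BMI and that the adjusted model adds smoking; the data, variable names and effect sizes are hypothetical. The `auc()` helper uses the rank (Mann-Whitney) formulation rather than any particular package.

```r
set.seed(2)
n <- 2000

# Simulated stand-in data; variable names and effect sizes are hypothetical.
dat <- data.frame(
  age = rnorm(n, 56, 8),
  sex = rbinom(n, 1, 0.5),
  bmi = rnorm(n, 27, 4),
  smoker = rbinom(n, 1, 0.2)
)
dat$case <- rbinom(n, 1, plogis(-6 + 0.06 * dat$age + 1.5 * dat$smoker))

# AUC via ranks: the probability that a randomly chosen case receives a higher
# score than a randomly chosen control (ties contribute one half).
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Assumed model definitions: base = confounders only, adjusted = plus smoking.
base_fit <- glm(case ~ age + sex + bmi, family = binomial, data = dat)
adjusted_fit <- glm(case ~ age + sex + bmi + smoker, family = binomial, data = dat)

auc_base <- auc(fitted(base_fit), dat$case)
auc_adjusted <- auc(fitted(adjusted_fit), dat$case)
c(base = auc_base, adjusted = auc_adjusted)
```

These are in-sample AUCs; a held-out or cross-validated estimate would be less optimistic.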
Stability analyses for sPLS on lung adjusted for age, sex and BMI
Stability analyses for sPLS on lung adjusted for age, sex, BMI and smoking
Stability selection for sPLS on lung adjusted for age, sex, and BMI
Selection proportion for sPLS on lung adjusted for age, sex, and BMI
Use results from stability selection for sPLS, lambda = 36
Loading coefficients from sPLS on lung adjusted for age, sex, and BMI
Stability selection for sPLS on lung adjusted for age, sex, BMI and smoking
Selection proportion for sPLS on lung adjusted for age, sex, BMI and smoking
Use results from stability selection for sPLS, lambda = 38
Loading coefficients from sPLS on lung adjusted for age, sex, BMI and smoking
Stability analyses for sPLS on bladder adjusted for age, sex and BMI
Stability analyses for sPLS on bladder adjusted for age, sex, BMI and smoking
Stability selection for sPLS on bladder adjusted for age, sex, and BMI
Selection proportion for sPLS on bladder adjusted for age, sex, and BMI
Use results from stability selection for sPLS, lambda = 22
Loading coefficients from sPLS on bladder adjusted for age, sex, and BMI
Stability selection for sPLS on bladder adjusted for age, sex, BMI and smoking
Selection proportion for sPLS on bladder adjusted for age, sex, BMI and smoking
Use results from stability selection for sPLS, lambda = 26
Loading coefficients from sPLS on bladder adjusted for age, sex, BMI and smoking
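
The selection proportions listed above come from stability selection. The sketch below illustrates the idea in base R for a simplified case: a single-component sparse PLS with one outcome, where the loading vector is a soft-thresholded X'y, so keeping a fixed number of non-zero loadings amounts to keeping the variables with the largest |X'y| on standardised data. The `keep` argument is a hypothetical stand-in for the sparsity parameter, on the assumption that the report's lambda counts retained variables; this is not the code or package used in the original analysis.

```r
set.seed(3)
n <- 1000
p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
# Simulated outcome in which only x1 and x2 carry signal.
y <- rbinom(n, 1, plogis(1.5 * X[, 1] - 1.5 * X[, 2]))

# Simplified first-component sPLS selection: keep the `keep` variables with
# the largest absolute covariance with the outcome on standardised predictors.
spls_select <- function(X, y, keep) {
  cov_xy <- drop(crossprod(scale(X), y - mean(y)))
  selected <- rep(FALSE, ncol(X))
  selected[order(abs(cov_xy), decreasing = TRUE)[seq_len(keep)]] <- TRUE
  names(selected) <- colnames(X)
  selected
}

# Stability selection: repeat the selection on random subsamples and record
# the proportion of subsamples in which each variable is selected.
stability_selection <- function(X, y, keep, n_iter = 100, frac = 0.5) {
  hits <- replicate(n_iter, {
    idx <- sample(nrow(X), floor(frac * nrow(X)))
    spls_select(X[idx, , drop = FALSE], y[idx], keep)
  })
  rowMeans(hits)
}

sel_prop <- stability_selection(X, y, keep = 2)
head(sort(sel_prop, decreasing = TRUE))
```

Variables with a selection proportion close to 1 are selected consistently across subsamples; in practice the sparsity parameter and the proportion threshold are calibrated together.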
Lung cancer:
Bladder cancer:
Key points:
Questions
Report (Results section) outline:
Next steps are in bold